A Machine Learning approach to Classify web documents of freelancing and remote work in IT field
Abstract
A vertical search engine (VSE) indexes only pages that are relevant to a predefined subject, so it must determine which information in a web page is relevant. This paper addresses the relevancy of web pages that contain freelance and remote work postings. We propose a machine learning approach that examines the content of a web page: it extracts keyword occurrences, looks for relevant keywords, and classifies the page as positive if it contains remote work or freelance job postings. The approach incorporates incremental training to improve accuracy. Three experiments were designed by varying the number of relevant web pages given for training. Performance is measured using accuracy, precision, and recall. Our results suggest that only a few relevant web pages are needed for training to achieve high accuracy.
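The pipeline described in the abstract (keyword occurrence counts, a binary relevant/not-relevant decision, incremental training, and evaluation by accuracy, precision, and recall) can be illustrated with a minimal sketch. The abstract does not name the classifier or the keyword list, so everything here is an assumption for illustration: the `KEYWORDS` vocabulary, the sample page texts, and the choice of a multinomial naive Bayes model with Laplace smoothing are hypothetical and are not the paper's method.

```python
import math
import re
from collections import Counter

# Hypothetical vocabulary; the paper's actual keyword list is not given.
KEYWORDS = ["freelance", "remote", "contract", "developer", "hiring", "recipe", "weather"]


def keyword_counts(text):
    """Count how often each vocabulary keyword occurs in the page text."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    return [tokens[k] for k in KEYWORDS]


class IncrementalNB:
    """Multinomial naive Bayes (an assumed model) updated batch by batch."""

    def __init__(self, n_features):
        self.class_docs = {0: 0, 1: 0}
        self.feature_totals = {0: [0] * n_features, 1: [0] * n_features}

    def partial_fit(self, rows, labels):
        """Add a new batch of labeled pages without retraining from scratch."""
        for row, label in zip(rows, labels):
            self.class_docs[label] += 1
            totals = self.feature_totals[label]
            for i, count in enumerate(row):
                totals[i] += count

    def predict(self, row):
        """Return 1 (relevant posting) or 0 (not relevant)."""
        n_docs = sum(self.class_docs.values())
        scores = {}
        for label in (0, 1):
            totals = self.feature_totals[label]
            denom = sum(totals) + len(totals)  # Laplace smoothing
            score = math.log((self.class_docs[label] + 1) / (n_docs + 2))
            for i, count in enumerate(row):
                score += count * math.log((totals[i] + 1) / denom)
            scores[label] = score
        return 1 if scores[1] > scores[0] else 0


def evaluate(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Two small training batches of made-up pages, fed incrementally.
batches = [
    (["Hiring remote developer for freelance contract",
      "Weather forecast and a pasta recipe"], [1, 0]),
    (["Freelance developer wanted, remote work",
      "Recipe of the day: soup recipe"], [1, 0]),
]
model = IncrementalNB(len(KEYWORDS))
for texts, labels in batches:
    model.partial_fit([keyword_counts(t) for t in texts], labels)

test_texts = ["Remote freelance job for a Python developer",
              "Sunny weather and a cake recipe"]
predictions = [model.predict(keyword_counts(t)) for t in test_texts]
print(predictions, evaluate([1, 0], predictions))
```

Keeping per-class keyword totals as the only model state is what makes the incremental step cheap in this sketch: a new batch of labeled pages only adds to the counts, so earlier pages never need to be revisited.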
Similar resources
Georeferencing Semi-Structured Place-Based Web Resources Using Machine Learning
In recent years, content shared on the web has grown significantly. A large part of this information is publicly available in the form of semi-structured data. Moreover, a significant amount of this information is related to place. Such information refers to a location on the earth but does not contain any explicit coordinates. In this research, we tried to georefer...
A statistical approach to classify Skype traffic
Skype is one of the most powerful, high-quality chat tools, giving its users many free services such as audio calls, messaging, and video conferencing. Skype accounts for a large share of Internet traffic. Hence, Internet service providers need to identify this traffic for quality of service and network management. On the other hand, Skype developers ...
Machine learning algorithms in air quality modeling
Modern studies in environmental science and engineering show that deterministic models struggle to capture the relationship between the concentration of atmospheric pollutants and their emission sources. Recent advances in statistical modeling based on machine learning approaches have emerged as a solution to these issues. It is a fact that input variable type largely affec...
A new text-mining method for extracting user context information to improve the ranking of search engine results
Today, the importance of text processing and its uses is well known among researchers and students. The amount of textual material increases day by day, so we need useful ways to store it and retrieve information from it. For example, search engines such as Google, Yahoo, and Bing need to read a great many web documents and retrieve the most similar ones to the user ...
High performance of the support vector machine in classifying hyperspectral data using a limited dataset
To prospect for mineral deposits at a regional scale, recognition and classification of hydrothermal alteration zones using remote sensing data is a popular strategy. Due to the large number of spectral bands, classification of hyperspectral data may be negatively affected by the Hughes phenomenon. A practical way to handle the Hughes problem is preparing a lot of training samples until the size ...